De-serialize data back into Python objects using Apache Avro

Overview:

  • Apache Avro is a framework for data serialization and Remote Procedure Calls.
  • Apache Avro stores the schema of the data along with the serialized data, so a data file is self-describing and a reader does not need the schema supplied separately.
  • This article explains how to de-serialize an Avro data file back into Python objects.

De-serializing data into Python Objects:

  • Create a reader object using DataFileReader, passing the file object for the data file and a DatumReader object as parameters.
  • The DataFileReader reads the data file, while the DatumReader de-serializes the data present in the file.
  • Remember, the data file contains both the data and the schema of the data.
  • The data can contain both primitive types and complex types.

Example:

# Import the Avro classes
from avro.datafile import DataFileReader
from avro.io import DatumReader

# Create the file object for the serialized data file
fileObject = open("conference.avro", "rb")

# Read the file using DataFileReader and
# deserialize using DatumReader
dataReader = DataFileReader(fileObject, DatumReader())

# Print the conference details
for conferenceDetail in dataReader:
    print(conferenceDetail)

# Close the reader once all records have been read
dataReader.close()

 

Output:

{'name': 'Virutal conference', 'date': 25612345, 'location': 'New York', 'speakers': ['Speaker1', 'Speaker2'], 'participants': ['Participant1', 'Participant2', 'Participant3', 'Participant4', 'Participant5'], 'seatingArrangement': {'Participant1': 1, 'Participant2': 2, 'Participant3': 3, 'Participant4': 4, 'Participant5': 5}}
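
Each record yielded by the reader is an ordinary Python dict, so it can be inspected with plain Python. The sketch below does not use Avro. It copies the record from the output above as a literal and lists the Python type of each field. The helper name describe_types is hypothetical, and the comments about which Avro types produced these values are assumptions, since the schema of conference.avro is not shown here.

```python
# One record, copied verbatim from the output above.
# DataFileReader yields plain dicts shaped like this one.
conference = {
    'name': 'Virutal conference',
    'date': 25612345,
    'location': 'New York',
    'speakers': ['Speaker1', 'Speaker2'],
    'participants': ['Participant1', 'Participant2', 'Participant3',
                     'Participant4', 'Participant5'],
    'seatingArrangement': {'Participant1': 1, 'Participant2': 2,
                           'Participant3': 3, 'Participant4': 4,
                           'Participant5': 5},
}


def describe_types(record):
    """Return a dict mapping each field name to its Python type name.

    Hypothetical helper for illustration; not part of the avro package.
    """
    return {field: type(value).__name__ for field, value in record.items()}


field_types = describe_types(conference)
for field, type_name in field_types.items():
    print(f"{field}: {type_name}")

# Complex fields are regular containers (assumed to come from an Avro
# array and an Avro map), so normal indexing and key lookup work.
first_speaker = conference['speakers'][0]
seat_of_participant3 = conference['seatingArrangement']['Participant3']
print(first_speaker, seat_of_participant3)
```

In the earlier example, the same inspection could be applied to each conferenceDetail inside the for loop.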

 


Copyright 2024 © pythontic.com